Results 1 - 13 of 13
1.
Sci Rep ; 13(1): 4154, 2023 03 13.
Article in English | MEDLINE | ID: covidwho-2249038

ABSTRACT

The rapid spread of the COVID-19 pandemic has resulted in an unprecedented amount of sequence data of the SARS-CoV-2 genome: millions of sequences and counting. This amount of data, while orders of magnitude beyond the capacity of traditional approaches to understanding the diversity, dynamics, and evolution of viruses, is nonetheless a rich resource for machine learning (ML) approaches, which offer an alternative way to extract such important information from these data. It is hence of utmost importance to design a framework for testing and benchmarking the robustness of these ML models. This paper makes the first effort (to our knowledge) to benchmark the robustness of ML models by simulating biological sequences with errors. In this paper, we introduce several ways to perturb SARS-CoV-2 genome sequences to mimic the error profiles of common sequencing platforms such as Illumina and PacBio. We show, from experiments on a wide array of ML models, that for specific embedding methods some simulation-based approaches with different perturbation budgets are more robust (and accurate) than others to certain noise simulations on the input sequences. Our benchmarking framework may assist researchers in properly assessing different ML models and help them understand the behavior of the SARS-CoV-2 virus or avoid possible future pandemics.
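The abstract does not spell out the perturbation procedure. As a hedged illustration only, the sketch below applies independent per-base substitution, insertion, and deletion errors to a sequence. The function name `perturb` and the two rate settings are hypothetical: they are chosen to echo the commonly cited contrast between substitution-dominated Illumina short reads and indel-dominated PacBio long reads, and they are not the paper's actual error profiles or perturbation budgets.

```python
import random

BASES = "ACGT"

def perturb(seq: str, sub_rate: float, indel_rate: float, seed: int = 0) -> str:
    """Return a noisy copy of seq with independent per-base errors.

    Each base is deleted with probability indel_rate/2, preceded by a random
    inserted base with probability indel_rate/2, substituted with probability
    sub_rate, and otherwise kept unchanged.
    """
    rng = random.Random(seed)
    out = []
    for base in seq:
        r = rng.random()
        if r < indel_rate / 2:
            continue  # deletion: drop this base
        if r < indel_rate:
            out.append(rng.choice(BASES))  # insertion before the current base
            out.append(base)
        elif r < indel_rate + sub_rate:
            out.append(rng.choice([b for b in BASES if b != base]))  # substitution
        else:
            out.append(base)  # no error
    return "".join(out)

# Hypothetical rate settings for illustration only (not the paper's values).
ILLUMINA_LIKE = {"sub_rate": 0.005, "indel_rate": 0.0005}
PACBIO_LIKE = {"sub_rate": 0.01, "indel_rate": 0.10}

toy_sequence = "ACGTTGCA" * 6
noisy = perturb(toy_sequence, **PACBIO_LIKE, seed=42)
```

Fixing the seed makes each noisy copy reproducible, which matters when the same perturbed inputs must be fed to several embedding methods for a fair comparison.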


Subject(s)
Computer Simulation , Genome, Viral , Machine Learning , Research Design , SARS-CoV-2 , Machine Learning/standards , SARS-CoV-2/classification , SARS-CoV-2/genetics , Genome, Viral/genetics , Viral Proteins/genetics , COVID-19/virology , Sequence Analysis, RNA
3.
Crit Care ; 25(1): 328, 2021 09 08.
Article in English | MEDLINE | ID: covidwho-1582035

ABSTRACT

BACKGROUND: The coronavirus disease 2019 (COVID-19) pandemic caused by the SARS-CoV-2 virus has become the greatest, and a controversial, health issue for nations worldwide. It is associated with different clinical manifestations and a high mortality rate. Predicting mortality and identifying outcome predictors are crucial for critically ill patients with COVID-19. Multivariate and machine learning methods may be used to develop prediction models and reduce the complexity of clinical phenotypes. METHODS: Multivariate predictive analysis was applied to 108 of 250 clinical features, comorbidities, and blood markers captured at admission from a hospitalized cohort of patients (N = 250) with COVID-19. A model based on SIMPLS (statistically inspired modification of partial least squares) was developed to predict hospital mortality. To assess prediction accuracy, patients were randomly assigned to training and validation sets. Predictive partition analysis was performed to obtain cut-off values for continuous or categorical variables. Latent class analysis (LCA) was carried out to cluster patients with COVID-19 into low- and high-risk groups. Principal component analysis and LCA were used to find a subgroup of survivors at high risk of death. RESULTS: The SIMPLS-based model predicted hospital mortality in patients with COVID-19 with moderate predictive power (Q2 = 0.24) and high accuracy (AUC > 0.85) in separating non-survivors from survivors in the training and validation sets. This model was built from 18 clinical and comorbidity predictors and 3 blood biochemical markers. Coronary artery disease, diabetes, altered mental status, age > 65, and dementia were the top differentiating mortality predictors. CRP, prothrombin, and lactate were the most differentiating biochemical markers in the mortality prediction model. Clustering analysis identified high- and low-risk patients among COVID-19 survivors. 
CONCLUSIONS: An accurate COVID-19 mortality prediction model for hospitalized patients, based on clinical features and comorbidities, may play a beneficial role in the clinical setting by improving the management of patients with COVID-19. The current study demonstrated the application of machine-learning-based approaches to predict hospital mortality in patients with COVID-19, to identify the most important predictors among clinical, comorbidity, and blood biochemical variables, and to recognize high- and low-risk COVID-19 survivors.


Subject(s)
COVID-19/mortality , Hospital Mortality/trends , Machine Learning/standards , Severity of Illness Index , COVID-19/epidemiology , Cohort Studies , Female , Humans , Male , Prognosis , Respiration, Artificial/statistics & numerical data , Risk Assessment/methods , Risk Factors
4.
J Med Internet Res ; 23(2): e20545, 2021 02 19.
Article in English | MEDLINE | ID: covidwho-1573803

ABSTRACT

COVID-19 cases are increasing exponentially worldwide; however, the clinical phenotype of the disease remains unclear. Natural language processing (NLP) and machine learning approaches may yield key methods to rapidly identify individuals at high risk of COVID-19 and to understand key symptoms upon clinical manifestation and presentation. Data on such symptoms may not be accurately synthesized into patient records owing to the pressing need to treat patients in overburdened health care settings. In this scenario, clinicians may focus on documenting widely reported symptoms that indicate a confirmed diagnosis of COVID-19, at the expense of infrequently reported symptoms. While NLP solutions can play a key role in generating clinical phenotypes of COVID-19, they are constrained by these gaps in the data available from electronic health records (EHRs). A comprehensive record of clinic visits is required, and audio recordings may be the answer: a recording of a clinic visit represents a more comprehensive record of patient-reported symptoms. If done at scale, a combination of data from the EHR and recordings of clinic visits can be used to power NLP and machine learning models, thus rapidly generating a clinical phenotype of COVID-19. We propose a pipeline extending from audio or video recordings of clinic visits to a model that factors in clinical symptoms and predicts COVID-19 incidence. With vast amounts of available data, we believe that a prediction model can be rapidly developed to promote the accurate screening of individuals at high risk of COVID-19 and to identify patient characteristics that predict a greater risk of more severe infection. If clinical encounters are recorded and our NLP model is adequately refined, benchtop virologic findings would be better informed. While clinic visit recordings are not a panacea for this pandemic, they are a low-cost option with many potential benefits, which have only recently begun to be explored.


Subject(s)
Ambulatory Care/standards , COVID-19/genetics , Communications Media/standards , Electronic Health Records/standards , Machine Learning/standards , Natural Language Processing , Humans , Phenotype , SARS-CoV-2
6.
J Med Internet Res ; 23(4): e26211, 2021 04 14.
Article in English | MEDLINE | ID: covidwho-1190246

ABSTRACT

BACKGROUND: The COVID-19 pandemic is probably the greatest health catastrophe of the modern era. Spain's health care system has been exposed to uncontrollable numbers of patients over a short period, causing the system to collapse. Given that diagnosis is not immediate and there is no effective treatment for COVID-19, other tools have had to be developed to identify patients at risk of severe disease complications and thus optimize material and human resources in health care. No tools exist to identify patients who have a worse prognosis than others. OBJECTIVE: This study aimed to process a sample of electronic health records of patients with COVID-19 in order to develop a machine learning model that predicts the severity of infection and mortality from clinical laboratory parameters. Early patient classification can help optimize material and human resources, and analysis of the most important features of the model could provide more detailed insights into the disease. METHODS: After an initial performance comparison with several other well-known methods, the extreme gradient boosting algorithm was selected as the predictive method for this study. In addition, Shapley Additive Explanations was used to analyze the importance of the features of the resulting model. RESULTS: After data preprocessing, 1823 confirmed patients with COVID-19 and 32 predictor features were selected. On bootstrap validation, the extreme gradient boosting classifier yielded a value of 0.97 (95% CI 0.96-0.98) for the area under the receiver operating characteristic curve, 0.86 (95% CI 0.80-0.91) for the area under the precision-recall curve, 0.94 (95% CI 0.92-0.95) for accuracy, 0.77 (95% CI 0.72-0.83) for the F-score, 0.93 (95% CI 0.89-0.98) for sensitivity, and 0.91 (95% CI 0.86-0.96) for specificity. The 4 most relevant features for model prediction were lactate dehydrogenase activity, C-reactive protein levels, neutrophil counts, and urea levels. 
CONCLUSIONS: Our predictive model yielded excellent results in differentiating patients who died of COVID-19, primarily on the basis of laboratory parameter values. Analysis of the resulting model identified a set of features with the most significant impact on the prediction, thus relating them to a higher risk of mortality.
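The abstract reports bootstrap confidence intervals without describing the resampling scheme. As a hedged sketch only, the code below computes a generic percentile-bootstrap 95% CI for the area under the ROC curve, using the rank-based (Mann-Whitney) form of AUROC; it is not necessarily the procedure the authors used, and the function names are hypothetical.

```python
import random

def auroc(labels, scores):
    """Rank-based AUROC: probability that a random positive scores above a
    random negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else (0.5 if p == q else 0.0)
    return wins / (len(pos) * len(neg))

def bootstrap_auroc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUROC plus a percentile bootstrap (1 - alpha) confidence interval."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the two classes
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    k = round(n_boot * alpha / 2)
    return auroc(labels, scores), stats[k], stats[n_boot - 1 - k]
```

Resampling whole patients (label and score together) keeps each bootstrap replicate a plausible cohort; the quadratic AUROC loop is fine for a few thousand patients but a sort-based rank computation would scale better.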


Subject(s)
COVID-19/epidemiology , Laboratories/standards , Machine Learning/standards , Adolescent , Adult , Aged , Aged, 80 and over , Child , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Male , Middle Aged , Pandemics , Prognosis , Reproducibility of Results , Research Design , Retrospective Studies , SARS-CoV-2/isolation & purification , Spain/epidemiology , Treatment Outcome , Young Adult
7.
JMIR Public Health Surveill ; 6(4): e22400, 2020 10 22.
Article in English | MEDLINE | ID: covidwho-1172949

ABSTRACT

BACKGROUND: Racial disparities in health care are well documented in the United States. As machine learning methods become more common in health care settings, it is important to ensure that these methods do not contribute to racial disparities through biased predictions or differential accuracy across racial groups. OBJECTIVE: The goal of the research was to assess a machine learning algorithm intentionally developed to minimize bias in in-hospital mortality predictions between white and nonwhite patient groups. METHODS: Bias was minimized through preprocessing of algorithm training data. We performed a retrospective analysis of electronic health record data from patients admitted to the intensive care unit (ICU) at a large academic health center between 2001 and 2012, drawing data from the Medical Information Mart for Intensive Care-III database. Patients were included if they had at least 10 hours of available measurements after ICU admission, had at least one of every measurement used for model prediction, and had recorded race/ethnicity data. Bias was assessed through the equal opportunity difference. Model performance in terms of bias and accuracy was compared with the Modified Early Warning Score (MEWS), the Simplified Acute Physiology Score II (SAPS II), and the Acute Physiology and Chronic Health Evaluation (APACHE). RESULTS: The machine learning algorithm was found to be more accurate than all comparators, with a higher sensitivity, specificity, and area under the receiver operating characteristic curve. The machine learning algorithm was found to be unbiased (equal opportunity difference 0.016, P=.20). APACHE was also found to be unbiased (equal opportunity difference 0.019, P=.11), while SAPS II and MEWS were found to have significant bias (equal opportunity difference 0.038, P=.006 and equal opportunity difference 0.074, P<.001, respectively). 
CONCLUSIONS: This study indicates there may be significant racial bias in commonly used severity scoring systems and that machine learning algorithms may reduce bias while improving on the accuracy of these methods.
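Equal opportunity difference is commonly defined as the gap in true positive rate between two groups; here the true positive rate would be the share of patients who died whom the model flagged as high risk. A minimal sketch, assuming binary labels and binary predictions and reporting the absolute difference; the function names and group labels are hypothetical, and the significance tests behind the reported P values are not reproduced.

```python
def true_positive_rate(y_true, y_pred):
    """Fraction of actual positives (y_true == 1) that were predicted positive."""
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(1 for p in positives if p == 1) / len(positives)

def equal_opportunity_difference(y_true, y_pred, groups, group_a, group_b):
    """Absolute gap in true positive rate between group_a and group_b."""
    def tpr_for(g):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        return true_positive_rate([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return abs(tpr_for(group_a) - tpr_for(group_b))
```

A value near 0 means deaths are detected at similar rates in both groups, which is the sense in which a model with a small, nonsignificant difference is described as unbiased.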


Subject(s)
Forecasting/methods , Hospital Mortality , Machine Learning/standards , APACHE , Adult , Aged , Algorithms , Cohort Studies , Early Warning Score , Electronic Health Records/statistics & numerical data , Female , Humans , Machine Learning/statistics & numerical data , Male , Middle Aged , Retrospective Studies , Simplified Acute Physiology Score
8.
Int J Qual Health Care ; 33(1)2021 Mar 04.
Article in English | MEDLINE | ID: covidwho-1066349

ABSTRACT

Federated learning (FL), a distributed machine learning (ML) technique, has lately attracted increasing attention from healthcare stakeholders, as FL is perceived as a promising decentralized approach to addressing data privacy and security concerns. The FL approach stores and maintains privacy-sensitive data locally while allowing multiple sites to train ML models collaboratively. We aim to describe the most recent real-world cases using FL in both COVID-19 and non-COVID-19 scenarios and also highlight current limitations and practical challenges of FL.
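The abstract describes FL only at a high level. As a hedged sketch of the core idea, the code below uses federated averaging (FedAvg), one widely used aggregation scheme that may not be what the reviewed deployments used: each site runs a few steps of training on its own data, and only the resulting parameters are sent for a size-weighted average. The linear model, learning rate, and site data are hypothetical.

```python
def fed_avg(site_weights, site_sizes):
    """FedAvg-style aggregation: average per-site parameter vectors,
    weighting each site by its number of local training examples."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
            for j in range(dim)]

def local_update(weights, data, lr=0.1, epochs=5):
    """Hypothetical local training at one site: a few epochs of full-batch
    gradient descent on squared error for y ~ w0 + w1 * x."""
    w0, w1 = weights
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in data:
            err = (w0 + w1 * x) - y
            g0 += err
            g1 += err * x
        w0 -= lr * g0 / len(data)
        w1 -= lr * g1 / len(data)
    return [w0, w1]

def federated_round(global_weights, site_datasets):
    """One round: each site trains locally; only parameters leave the site."""
    updates = [local_update(global_weights, data) for data in site_datasets]
    return fed_avg(updates, [len(data) for data in site_datasets])

# Two hypothetical sites whose private data follow y = 1 + 2x.
sites = [[(x, 1 + 2 * x) for x in (0.0, 0.5, 1.0)],
         [(x, 1 + 2 * x) for x in (1.5, 2.0)]]
weights = [0.0, 0.0]
for _ in range(200):
    weights = federated_round(weights, sites)
```

The raw `(x, y)` pairs never leave their site; only the two model parameters are exchanged each round, which is the privacy property the abstract attributes to FL.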


Subject(s)
COVID-19/epidemiology , Computer Security/statistics & numerical data , Confidentiality/standards , Electronic Health Records/organization & administration , Machine Learning/standards , Electronic Health Records/standards , Humans , SARS-CoV-2
9.
J Med Internet Res ; 22(12): e24048, 2020 12 02.
Article in English | MEDLINE | ID: covidwho-1024476

ABSTRACT

BACKGROUND: Conventional diagnosis of COVID-19 with reverse transcription polymerase chain reaction (RT-PCR) testing (hereafter, PCR) is associated with prolonged time to diagnosis and significant costs to run the test. The SARS-CoV-2 virus might lead to characteristic patterns in the results of widely available, routine blood tests that could be identified with machine learning methodologies. Machine learning modalities integrating findings from these common laboratory test results might accelerate ruling out COVID-19 in emergency department patients. OBJECTIVE: We sought to develop (ie, train and internally validate with cross-validation techniques) and externally validate a machine learning model to rule out COVID-19 using only routine blood tests among adults in emergency departments. METHODS: Using clinical data from emergency departments (EDs) from 66 US hospitals before the pandemic (before the end of December 2019) or during the pandemic (March-July 2020), we included patients aged ≥20 years in the study time frame. We excluded those with missing laboratory results. Model training used 2183 PCR-confirmed cases from 43 hospitals during the pandemic; negative controls were 10,000 prepandemic patients from the same hospitals. External validation used 23 hospitals with 1020 PCR-confirmed cases and 171,734 prepandemic negative controls. The main outcome was COVID-19 status predicted using same-day routine laboratory results. Model performance was assessed with area under the receiver operating characteristic (AUROC) curve as well as sensitivity, specificity, and negative predictive value (NPV). RESULTS: Of 192,779 patients included in the training, external validation, and sensitivity data sets (median age decile 50 [IQR 30-60] years, 40.5% male [78,249/192,779]), AUROC for training and external validation was 0.91 (95% CI 0.90-0.92). 
Using a risk score cutoff of 1.0 (out of 100) in the external validation data set, the model achieved sensitivity of 95.9% and specificity of 41.7%; with a cutoff of 2.0, sensitivity was 92.6% and specificity was 59.9%. At the cutoff of 2.0, the NPVs at a prevalence of 1%, 10%, and 20% were 99.9%, 98.6%, and 97%, respectively. CONCLUSIONS: A machine learning model developed with multicenter clinical data integrating commonly collected ED laboratory data demonstrated high rule-out accuracy for COVID-19 status, and might inform selective use of PCR-based testing.
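The prevalence-dependent NPVs follow from Bayes' rule applied to the reported sensitivity and specificity. The short check below, a sketch with a hypothetical helper name `npv`, reproduces the three figures quoted for the cutoff of 2.0.

```python
def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from Bayes' rule:
    NPV = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)

# Sensitivity and specificity at the cutoff of 2.0, as reported in the abstract.
for prevalence in (0.01, 0.10, 0.20):
    print(f"prevalence {prevalence:.0%}: NPV {npv(0.926, 0.599, prevalence):.1%}")
# prevalence 1%: NPV 99.9%
# prevalence 10%: NPV 98.6%
# prevalence 20%: NPV 97.0%
```

Because false negatives scale with prevalence, NPV falls as COVID-19 becomes more common, which is why a rule-out tool is most useful in low-prevalence settings.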


Subject(s)
COVID-19/diagnosis , Emergency Service, Hospital , Hematologic Tests/methods , Machine Learning/standards , Adult , Aged , Area Under Curve , Female , Hospitals , Humans , Laboratories , Male , Middle Aged , Pandemics , ROC Curve , Reproducibility of Results , SARS-CoV-2 , Sensitivity and Specificity
10.
J Med Internet Res ; 22(11): e24018, 2020 11 06.
Article in English | MEDLINE | ID: covidwho-979821

ABSTRACT

BACKGROUND: COVID-19 has infected millions of people worldwide and is responsible for several hundred thousand fatalities. The COVID-19 pandemic has necessitated thoughtful resource allocation and early identification of high-risk patients. However, effective methods to meet these needs are lacking. OBJECTIVE: The aims of this study were to analyze the electronic health records (EHRs) of patients who tested positive for COVID-19 and were admitted to hospitals in the Mount Sinai Health System in New York City; to develop machine learning models for making predictions about the hospital course of the patients over clinically meaningful time horizons based on patient characteristics at admission; and to assess the performance of these models at multiple hospitals and time points. METHODS: We used Extreme Gradient Boosting (XGBoost) and baseline comparator models to predict in-hospital mortality and critical events at time windows of 3, 5, 7, and 10 days from admission. Our study population included harmonized EHR data from five hospitals in New York City for 4098 COVID-19-positive patients admitted from March 15 to May 22, 2020. The models were first trained on patients from a single hospital (n=1514) admitted on or before May 1, externally validated on patients from four other hospitals (n=2201) admitted on or before May 1, and prospectively validated on all patients admitted after May 1 (n=383). Finally, we established model interpretability to identify and rank variables that drive model predictions. RESULTS: Upon cross-validation, the XGBoost classifier outperformed baseline models, with an area under the receiver operating characteristic curve (AUC-ROC) for mortality of 0.89 at 3 days, 0.85 at 5 and 7 days, and 0.84 at 10 days. XGBoost also performed well for critical event prediction, with an AUC-ROC of 0.80 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days. 
In external validation, XGBoost achieved an AUC-ROC of 0.88 at 3 days, 0.86 at 5 days, 0.86 at 7 days, and 0.84 at 10 days for mortality prediction. Similarly, the unimputed XGBoost model achieved an AUC-ROC of 0.78 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days. Trends in performance on prospective validation sets were similar. At 7 days, acute kidney injury on admission, elevated LDH, tachypnea, and hyperglycemia were the strongest drivers of critical event prediction, while higher age, anion gap, and C-reactive protein were the strongest drivers of mortality prediction. CONCLUSIONS: We trained machine learning models to predict mortality and critical events in patients with COVID-19 at different time horizons and validated them externally and prospectively. These models identified at-risk patients and uncovered underlying relationships that predicted outcomes.
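Predicting an outcome "at time windows of 3, 5, 7, and 10 days from admission" amounts to training one binary classifier per horizon. A minimal sketch of how such per-horizon labels could be derived from admission and event dates; the function name and the inclusive "within k days" rule are assumptions, not the study's documented definition.

```python
from datetime import date
from typing import Dict, Optional

# Prediction windows named in the abstract (days from admission).
HORIZONS = (3, 5, 7, 10)

def horizon_labels(admission: date, event: Optional[date]) -> Dict[int, int]:
    """Binary label per horizon: 1 if the event occurred within k days of admission.

    Assumptions (not from the study): "within k days" is inclusive, and a patient
    with no recorded event (event is None) is labeled 0 at every horizon.
    """
    if event is None:
        return {k: 0 for k in HORIZONS}
    days = (event - admission).days
    return {k: int(days <= k) for k in HORIZONS}
```

Labeling patients without an event as 0 ignores censoring (for example, transfer or discharge before the horizon closes); a real pipeline would need an explicit rule for such cases.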


Subject(s)
Coronavirus Infections/diagnosis , Coronavirus Infections/mortality , Machine Learning/standards , Pneumonia, Viral/diagnosis , Pneumonia, Viral/mortality , Acute Kidney Injury/epidemiology , Adolescent , Adult , Aged , Aged, 80 and over , Betacoronavirus , COVID-19 , Cohort Studies , Electronic Health Records , Female , Hospital Mortality , Hospitalization/statistics & numerical data , Hospitals , Humans , Male , Middle Aged , New York City/epidemiology , Pandemics , Prognosis , ROC Curve , Risk Assessment/methods , Risk Assessment/standards , SARS-CoV-2 , Young Adult
11.
J Med Internet Res ; 22(11): e23128, 2020 11 11.
Article in English | MEDLINE | ID: covidwho-976118

ABSTRACT

BACKGROUND: Patients with COVID-19 in the intensive care unit (ICU) have a high mortality rate, and methods to assess patients' prognosis early and administer precise treatment are of great significance. OBJECTIVE: The aim of this study was to use machine learning to construct a model for the analysis of risk factors and prediction of mortality among ICU patients with COVID-19. METHODS: In this study, 123 patients with COVID-19 in the ICU of Vulcan Hill Hospital were retrospectively selected from the database, and the data were randomly divided into a training data set (n=98) and a test data set (n=25) at a 4:1 ratio. Significance tests, correlation analysis, and factor analysis were used to screen 100 potential risk factors individually. Conventional logistic regression methods and four machine learning algorithms were used to construct the risk prediction model for the prognosis of patients with COVID-19 in the ICU. The performance of these machine learning models was measured by the area under the receiver operating characteristic curve (AUC). Interpretation and evaluation of the risk prediction model were performed using calibration curves, SHapley Additive exPlanations (SHAP), Local Interpretable Model-Agnostic Explanations (LIME), etc, to ensure its stability and reliability. The outcome was based on the ICU deaths recorded in the database. RESULTS: Layer-by-layer screening of 100 potential risk factors finally revealed 8 important risk factors that were included in the risk prediction model: lymphocyte percentage, prothrombin time, lactate dehydrogenase, total bilirubin, eosinophil percentage, creatinine, neutrophil percentage, and albumin level. Finally, an eXtreme Gradient Boosting (XGBoost) model built on the 8 important risk factors showed the best discriminative ability in 5-fold cross-validation on the training set (AUC=0.86) and in the validation cohort (AUC=0.92). 
The calibration curve showed that the risk predicted by the model was in good agreement with the actual risk. In addition, the SHAP and LIME algorithms were used to produce feature-level and sample-level interpretations of the predictions of the XGBoost black-box model. The model was also translated into a web-based risk calculator that is freely available for public use. CONCLUSIONS: The 8-factor XGBoost model predicts the risk of death in ICU patients with COVID-19 well; it shows initial evidence of stability and can be used effectively to predict COVID-19 prognosis in ICU patients.


Subject(s)
COVID-19/epidemiology , Machine Learning/standards , Algorithms , Female , Humans , Intensive Care Units , Male , Prognosis , Reproducibility of Results , Retrospective Studies , Risk Factors
12.
J Med Internet Res ; 22(11): e24225, 2020 11 09.
Article in English | MEDLINE | ID: covidwho-930817

ABSTRACT

BACKGROUND: Prioritizing patients in need of intensive care is necessary to reduce the mortality rate during the COVID-19 pandemic. Although several scoring methods have been introduced, many require laboratory or radiographic findings that are not always easily available. OBJECTIVE: The purpose of this study was to develop a machine learning model that predicts the need for intensive care for patients with COVID-19 using easily obtainable characteristics: baseline demographics, comorbidities, and symptoms. METHODS: A retrospective study was performed using a nationwide cohort in South Korea. Patients admitted to 100 hospitals from January 25, 2020, to June 3, 2020, were included. Patient information was collected retrospectively by the attending physicians in each hospital and uploaded to an online case report form. Variables that could be easily provided were extracted. The variables were age, sex, smoking history, body temperature, comorbidities, activities of daily living, and symptoms. The primary outcome was the need for intensive care, defined as admission to the intensive care unit, use of extracorporeal life support, mechanical ventilation, vasopressors, or death within 30 days of hospitalization. Patients admitted until March 20, 2020, were included in the derivation group to develop prediction models using an automated machine learning technique. The models were externally validated in patients admitted after March 21, 2020. The machine learning model with the best discrimination performance was selected and compared against the CURB-65 (confusion, urea, respiratory rate, blood pressure, and 65 years of age or older) score using the area under the receiver operating characteristic curve (AUC). RESULTS: A total of 4787 patients were included in the analysis, of whom 3294 were assigned to the derivation group and 1493 to the validation group. Among the 4787 patients, 460 (9.6%) patients needed intensive care. 
Of the 55 machine learning models developed, the XGBoost model revealed the highest discrimination performance. The AUC of the XGBoost model was 0.897 (95% CI 0.877-0.917) for the derivation group and 0.885 (95% CI 0.855-0.915) for the validation group. Both the AUCs were superior to those of CURB-65, which were 0.836 (95% CI 0.825-0.847) and 0.843 (95% CI 0.829-0.857), respectively. CONCLUSIONS: We developed a machine learning model comprising simple patient-provided characteristics, which can efficiently predict the need for intensive care among patients with COVID-19.
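For reference, CURB-65, the comparator in this study, is a 0-5 point score with one point per criterion met. The thresholds below are the standard British Thoracic Society ones (urea > 7 mmol/L, respiratory rate ≥ 30/min, systolic BP < 90 mmHg or diastolic BP ≤ 60 mmHg, age ≥ 65 years); the abstract does not state which variant or urea unit the authors applied, so treat them as an assumption. Note that the urea criterion requires a laboratory value, unlike the patient-provided inputs of the study's model.

```python
def curb65(confusion: bool, urea_mmol_per_l: float, respiratory_rate: float,
           systolic_bp: float, diastolic_bp: float, age_years: float) -> int:
    """CURB-65 score (0-5): one point per criterion met."""
    criteria = [
        confusion,                               # C: new-onset confusion
        urea_mmol_per_l > 7,                     # U: urea > 7 mmol/L (lab value)
        respiratory_rate >= 30,                  # R: respiratory rate >= 30/min
        systolic_bp < 90 or diastolic_bp <= 60,  # B: low blood pressure (mmHg)
        age_years >= 65,                         # 65: age 65 years or older
    ]
    return sum(bool(c) for c in criteria)
```

In an AUC comparison like the one above, the integer score itself serves as the ranking variable, so its coarse 0-5 scale can limit discrimination relative to a model that outputs a continuous risk.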


Subject(s)
COVID-19/epidemiology , Machine Learning/standards , COVID-19/mortality , Cohort Studies , Female , Humans , Male , Middle Aged , Prognosis , Retrospective Studies , Survival Analysis
13.
Int J Epidemiol ; 49(6): 1918-1929, 2021 01 23.
Article in English | MEDLINE | ID: covidwho-807732

ABSTRACT

BACKGROUND: Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 infection, has been spreading globally. We aimed to develop a clinical model to predict, early on, the outcome of patients with severe COVID-19 infection. METHODS: Demographic, clinical and first laboratory findings after admission of 183 patients with severe COVID-19 infection (115 survivors and 68 non-survivors from the Sino-French New City Branch of Tongji Hospital, Wuhan) were used to develop the predictive models. Machine learning approaches were used to select the features and predict the patients' outcomes. The area under the receiver operating characteristic curve (AUROC) was applied to compare the models' performance. A total of 64 patients with severe COVID-19 infection from the Optical Valley Branch of Tongji Hospital, Wuhan, were used to externally validate the final predictive model. RESULTS: The baseline characteristics and laboratory tests were significantly different between the survivors and non-survivors. Four variables (age, high-sensitivity C-reactive protein level, lymphocyte count and d-dimer level) were selected by all five models. Given the similar performance among the models, the logistic regression model was selected as the final predictive model because of its simplicity and interpretability. The AUROC of the external validation set was 0.881. The sensitivity and specificity were 0.839 and 0.794 for the validation set, when using a probability of death of 50% as the cutoff. A risk score based on the selected variables can be used to assess the mortality risk. The predictive model is available at [https://phenomics.fudan.edu.cn/risk_scores/]. CONCLUSIONS: Age, high-sensitivity C-reactive protein level, lymphocyte count and d-dimer level of COVID-19 patients at admission are informative for the patients' outcomes.
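The reported sensitivity and specificity depend on turning the logistic model's predicted probability of death into a yes/no prediction at 50%. A minimal sketch, assuming the standard logistic link and that a probability of at least 0.5 counts as a predicted death (the abstract does not say how exactly 50% was handled). The fitted coefficients are not given in the abstract, so `death_probability` takes them as hypothetical inputs.

```python
import math

def death_probability(intercept, coefs, features):
    """Logistic-regression probability: 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))

def sensitivity_specificity(y_true, probs, cutoff=0.5):
    """y_true: 1 = died, 0 = survived. A probability >= cutoff predicts death."""
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= cutoff)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p < cutoff)
    tn = sum(1 for y, p in zip(y_true, probs) if y == 0 and p < cutoff)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)
```

Moving `cutoff` trades sensitivity against specificity; the AUROC summarizes that trade-off over all cutoffs, while 50% is just one operating point on the curve.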


Subject(s)
COVID-19/diagnosis , COVID-19/mortality , Machine Learning/standards , Patient Admission/statistics & numerical data , SARS-CoV-2 , Aged , Case-Control Studies , Female , Hospitalization/statistics & numerical data , Hospitals , Humans , Male , Middle Aged , ROC Curve , Risk Assessment/methods , Risk Assessment/standards , Sensitivity and Specificity